CHL5230H-Applied Machine Learning for Health Data

Instructor: Zahra Shakeri – Fall 2023
Dalla Lana School of Public Health-University of Toronto
Datathon #2


Datathon Description and Instructions

Datathon Context and Objective

Accurately predicting mortality from data collected in the first 24 hours following admission to an Intensive Care Unit (ICU) is of paramount importance in clinical medicine. Early and precise mortality risk assessment allows for effective resource allocation, ensuring that critical medical equipment and skilled personnel are available for the patients who need them the most. It also aids strategic planning, allowing medical teams to prioritize interventions, tailor treatment plans, and make informed decisions about the potential benefits of aggressive versus palliative care strategies. Furthermore, early mortality predictions give families invaluable information, facilitating difficult conversations and helping loved ones set realistic expectations, prepare for potential outcomes, and make time-sensitive decisions about end-of-life care. In an environment where every moment counts, using data from the first 24 hours of ICU admission for mortality prediction not only optimizes clinical interventions but also provides compassionate guidance to families during their most challenging times. With this in mind, the datathon explores how well Machine Learning models can predict mortality using real-world data from that critical first day in the ICU, with the goal of enhancing patient care and supporting families in difficult moments.

Datasets Information

This dataset is provided in collaboration with MIT’s GOSSIS community initiative and has received privacy certification from the Harvard Privacy Lab. It includes records from over 91,000 intensive care unit (ICU) visits at various hospitals, covering an entire year. What makes this dataset truly unique is its global scope, as it is part of a collaborative effort connecting healthcare institutions in Argentina, Australia, New Zealand, Sri Lanka, Brazil, and over 200 hospitals in the United States.

The datasets and a detailed data dictionary will be available under Modules/Datathon #4 at 6:45 pm on Tuesday, October 31, 2023.

Instructions for Submission

You are encouraged to discuss your work with your teammates and other teams and can use online and offline resources. However, all members of your team should make substantial, meaningful contributions to your submission, ensuring fairness to all participating teams in this datathon. Teams must submit the low-fidelity prototype by the in-class deadline (8:00 PM on October 31, 2023) and all remaining materials by the final deadline (2:00 PM on November 14, 2023). Teams are advised to work consistently on the deliverables from the outset rather than attempting to complete them all within the last hour. You should begin work on the deliverables at least three days before the deadline.

Components of Submission

1. Low-fidelity Prototype (In-class Submission)

The first phase of this Datathon involves collaborative efforts among students, aimed at transforming the provided datasets into actionable insights. Teams should formulate research questions and outline their data analysis plans, then submit a low-fidelity prototype of their solution to Assignments/Datathon #4/Low-fidelity Prototype. Please adhere to the naming convention outlined later in this document when naming your one-page PDF submission for today.

Every team is required to submit their low-fidelity prototype through Quercus by 8:00 PM on October 31st, 2023. A successful submission should include a clear and legible list of research questions that you plan to address using the provided datasets. Additionally, provide a detailed plan specifying the analysis methods (e.g. machine learning) you intend to employ for addressing these questions. Ensure that each research question corresponds to its respective analysis plan.

Please note that you are not obligated to finalize your solution or research questions at this stage. If you come up with a better idea during the week, feel free to update your plan. The primary goal of the low-fidelity milestone is to initiate the brainstorming phase of a data science project, which is typically the initial and most critical phase. It allows you to see how the project’s direction may evolve during your analysis.

2. A High-fidelity Prototype

All teams are expected to submit their analysis results and deliver brief presentations (2 minutes for the presentation, followed by 1 minute for questions) consisting of a minimum of 2 and a maximum of 3 slides. The purpose of these presentations is to guide your instructor and TA(s) on how you leveraged the available data to address the research question you formulated.

During your presentation, cover essential elements, including meaningful results, the data analysis process, challenges encountered, and key findings. While you have the flexibility to decide the presentation’s content, it should focus on conveying a clear understanding of the analytical process, findings, and conclusions. In essence, the presentation should provide a condensed version of the written report.

To allow the TA to prepare teams’ presentations effectively, it is imperative that teams finalize their submissions by 2:00 PM on November 14, 2023.

3. A Written Report

Teams are required to compile a report that details the steps taken to address their proposed question or prompt. While there is not a prescribed format for the report, it should encompass key sections such as:

  • Introduction: Explain the questions you aimed to answer with the data and their significance.
  • Data Engineering Process: Describe how you cleaned and prepared the data and specify the datasets used.
  • Analysis: Outline the learning and analysis techniques employed, along with the rationale behind their selection.
  • Findings: Present your discoveries and insights.
  • Conclusion: Summarize what health practitioners can infer from your team’s work.
  • Individual Contributions: Highlight the contributions of each team member throughout the entire process.
  • Code and Presentation: Host your Datathon materials, including notebooks and datasets, on GitHub. Share the GitHub project link in the report for easy access by the TA. Also, utilize Google Presentation to host your presentation and provide the public link in the report.

Note: When submitting your report to Quercus, please consolidate all components into one PDF file and include links to other relevant elements within the report. Name your file following the format: Team Number-CHL5230-F23 (e.g., 25-CHL5230-F23.PDF). Submissions not adhering to this naming convention will not be considered for grading. Additionally, ensure that you include your team number and the names of all team members in your report.
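
The naming format above can be checked with a short script before uploading. The following is only a minimal sketch in Python: the helper names `report_filename` and `is_valid_report_name` are hypothetical, and it assumes that the team number consists of digits only and that the extension may be written in upper or lower case (the example above uses `.PDF`). Neither assumption is stated as a rule in this document.

```python
import re

# Assumed pattern: "<team number>-CHL5230-F23.pdf", e.g. "25-CHL5230-F23.PDF".
# Digits-only team numbers and a case-insensitive extension are assumptions.
_REPORT_NAME = re.compile(r"\d+-CHL5230-F23\.[Pp][Dd][Ff]")


def report_filename(team_number: int) -> str:
    """Build the report file name for a team (hypothetical helper)."""
    if team_number < 0:
        raise ValueError("team number must be non-negative")
    return f"{team_number}-CHL5230-F23.pdf"


def is_valid_report_name(name: str) -> bool:
    """Return True if the whole file name matches the assumed format."""
    return _REPORT_NAME.fullmatch(name) is not None
```

For instance, `is_valid_report_name("25-CHL5230-F23.PDF")` accepts the example given above, while a name such as `Team25-CHL5230-F23.pdf` is rejected because of the extra prefix.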

At a minimum, the report should cover the question addressed, findings, the data analysis process, and a conclusion. The report must not exceed two pages in length. While the code should be functional and produce the reported results, it will not be evaluated based on code quality.

Ensure that all materials are submitted by 2:00 pm on November 14th. Unfortunately, no late submissions will be accepted.

This Datathon is pretty free-form! This is intentional; projects you work on in industry will rarely be very specific. Please feel free to show early results to me to get some feedback you can use to ensure a successful submission!

Important Dates

Component               Due Time              Where to Submit?
Data Availability       October 31, 6:45 pm   Modules/Datathon #4
Low-fidelity Prototype  October 31, 8:00 pm   Assignments/Datathon #4/Low-fidelity Prototype
Written Report          November 14, 2:00 pm  Assignments/Datathon #4/Written Report